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ABSTRACT 

I present a new approach to recover the primordial density fluctuations and the cosmic 
web structure underlying a galaxy distribution. The method is based on sampling Gaussian 
fields which are compatible with a galaxy distribution and a structure formation model. This 
is achieved by splitting the inversion problem into two Gibbs-sampling steps: the first being 
a Gaussianisation step transforming a distribution of point sources at Lagrangian positions 
-which are not a priori given- into a linear alias-free Gaussian field. This step is based on 
Hamiltonian sampling with a Gaussian-Poisson model. The second step consists on a likeli- 
hood comparison in which the set of matter tracers at the initial conditions is constrained on 
the galaxy distribution and the assumed structure formation model. For computational rea- 
sons second order Lagrangian Perturbation Theory is used. However, the presented approach 
is flexible to adopt any structure formation model. A semi-analytic halo-model based galaxy 
mock catalog is taken to demonstrate that the recovered initial conditions are closely unbiased 
with respect to the actual ones from the corresponding TV-body simulation down to scales of 
a ^5 Mpc/h. The cross-correlation between them shows a substantial gain of information, 
being at k ~ 0.3 h/Mpc more than doubled. In addition the initial conditions are extremely 
well Gaussian distributed and the power-spectra follow the shape of the linear power-spectrum 
being very close to the actual one from the simulation down to scales of k ~ 1 /i/Mpc. 

Key words: (cosmology:) large-scale structure of Universe - galaxies: clusters: general - 
catalogues - galaxies: statistics 



1 INTRODUCTION 

The primordial density fluctuations of the Universe comprise 
all the information of the cosmological large-scale structure at 
any later time assuming that the theory of structure formation 
is known. Accurate estimates of the initial cosmic density field 
would hence also lead to a reconstruction of the formation of 
cosmic structures. This is one of the main motivations under- 
lying constrained simulations of the local Univer se (see e.g. 
iMathis et alj|2002l : iKlypin et alj|2003l : lLavauxll20ld) . The recon- 
structed density field is especially valueable as it permits one to 
study the cosmic web and perform environme n tal studies (see e.g . 



Hahn et al. 2007; Forero-Romero et al. 



Muno z-Cuartas et alj|201 1| ; IWang et al 



2< 



2009; 
2012; 



Jasche et al. 



Platen et al. 



2010; 
20 1 lb . 



Moreover, undoing the major effects of gravity on large scales has 
been shown to increase the cosmological information. For this pur- 
pose a large amount of techniques (mostly based on local transfor- 
mations) has been developed which Gaussiani se the cosmic den- 
sity field (see e. g. lNevrincketal.ll2009l,l201ll: ?;IWeinberd[l992l: 
Yu et al.ll201 ll: iNeyrinck et al.ll201 ll : Izhang et alj|201 lh . I refer to 



the Zeldovich approximation dZel'dovichl 1 19701) can already sig- 
nificantly improv e the baryon acoustic oscillation (BAO) measure- 
ments (see e.g . Eisenstein et alj 120071 ; IPadmanabhan et alj 120121 ; 
IXu et alj20ialMehta et alj2012h . 

The classical approach in the literature to solve the boundary 
problem to find the initial positions of a set of particles governed 
by the Eulerian equatio n of motion and gravity is based on the 
least action principle (see lPeebles 1989; Nu sser & Branchin i 2000; 
iBranchini et al.ll2002b - A similar approach consists on relating the 
observed positions of matter tracers (e.g. galaxies) in a geometrical 
way to a homogeneous distr ibution by minimizing a cost function 
(see e.g. lLavaux et alj|2008l) . One still needs then to find the cor- 
responding G aussian field to that point source distribution (see e.g. 
lLavaud201Ch . 

Ideally, one would wish to sample the density field S({q}) 
at the initial conditions {q} (Lagrangian positions) given the data 
at the present, in our case study, given a set of galaxies with their 
Eulerian coordinates {ccg}: 



Kitaura & Angulol (2012) for more complex nonlocal linearisation 



5({q}) ^ P(5({<l})\{XG}) : 



(1) 



schemes. In particular tracing the structures back in time within 
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where the arrow indicates the sampling process. The relationship 
between both coordinate systems is summarised by the following 
equation: x = q + ^ff, where St is the displacement which a particle 
suffers to go from its initial position q to its final position x. One 
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should note that the structure formation model is encoded in the 
displacement field vE". However, this direct approach is extremely 
complex since the PDF is highly non-Gaussian as the galaxies live 
in structures which have undergone nonlinear structure formation. 
An attempt to find a statistical formulation of such a PDF im- 
plies including the biased nature of a galaxy distribution beyond 
the Poisson statistics ( Kitaura 20 12a) and higher orde r correlation 
functions beyond the 2-point statistics (Kitaura 2012b). Such a sta- 
tistical description could be relevant to deal with primordial non- 
Gaussianities which will not be considered in this work. 

I suggest in this letter to radically simplify the problem split- 
ting it into well defined ones which encode our physical under- 
standing of structure formation on large scales in a forward ap- 
proach. In the next section, the method is presented in detail fol- 
lowed by a validation section based on tests with synthetic data. 
Finally the conclusions and discussion section is presented. 



2 METHOD 

Let us suppose that we have an unbiased sample of matter tracers at 
initial (Lagrangian) positions {q}. The problem of the reconstruc- 
tion of the primordial density fluctuations would be reduced to find 
the Gaussian field corresponding to a discrete point-source distri- 
bution. Once we knew the initial Gaussian field, we could apply a 
model of structure formation and match the structures we find in 
our simulation with the actual ones given by the observed galaxy 
distribution in a likelihood comparison procedure. In this way an 
iterative procedure can be constructed which yields a set of Gaus- 
sian fields corresponding to the initial distribution of matter tracers 
constrained on some data. The set of solutions will depend on the 
particular structure formation model and the matching criteria. 

In terms of conditional PDFs the two steps described above 
can be expressed in the following way: 

S({q}) ^ P(5({g})|M) (2) 

{q} ^ P({q}\{x G },S({q},m SF )), (3) 

where msF denotes the assumed structure formation model. 

(i) Step 1 (Eq.|2): Gaussianisation step: Sampling the Gaus- 
sian field given a set of matter tracers at the initial conditions. 

The first step can be solved using Bayes theorem. The posterior 
distribution P(S({q})\{q}) is given by the product of a Gaussian 
prior and a Poissonian likelihood (see Kitaura & EnBlin 2008, Ki- 
taura et al 2010). We assume a Gaussian prior P(S({q})\C) de- 
scribing the initial field 5({q}) which only depends on the correla- 
tion function (variance) C or power-spectrum and thus on a set of 
cosmological parameters. The likelihood P({q}\S({q}) is mod- 
eled by a Poissonian PDF describing the discrete nature of the test 
particles at their Lagrangian positions. In order to sample from such 
a posterior PDF the efficien t Hamiltonian Sampling technique is 
applied (see lJasche & Kitaurall20ld : IKitaura et alj20120 . This step 
yields Gaussian fields on a mesh with N c cells. One should note 
that similar approaches exist since long in the liter ature usually 
known under the term constrained realisations (see Bertschinger 
1987; Hoffman & Ribak 1991; van de Wevgaert & Bertschinger 



1996). 
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Figure 1. Matter statistics. Black curve represents the PDF of the overden- 
sity for the mean over 500 samples of the reconstructed initial Gaussian 
field. Shaded regions indicate 1 and 2 sigma contours. The dashed curve 
stands for the PDF of the nonlinear 2LPT reconstruction of the same sam- 
ple. The average skewness, kurtosis and mean of the initial conditions are 
also indicated by (5(<5({q}))), (/C(<5({q}))) and {M(6({q}))), respec- 
tively. 



galaxies {xg} to perform a constrained simulation. We need to 
sample the high dimensional statistical space spanned by the en- 
semble of nonlinearly evolved density fields compatible with the 
data. To find computationally feasible solutions we have to assume 
at this stage a simplified structure formation model which remains 
accurate on large-scales. Motivated by the recent findings in the 
accuracy of tracing the cosmic structures back in time and esti- 
mating the peculiar velocity fields with second order Lagrangian 
perturbation theo ry (2LPT) we have cho s en here to use this ap- 
proximation (see iKitaura & Anguldl2012L iKitaura et al]l2012h . In 
this framework we can compute the displacement field ^f(q) from 
the Gaussian field 5({q}) in an efficient way with FFTs. A random 
homogeneous Poisson distribution of N p particles is moved to the 
redshift in which the observations are present according to the pre- 
vi ously computed displacement field. Fo r technical details we refe r 
to lBuchertl Jl994l) ; lBouchet et al.l Jl995h ; lBernardeau et all J2002f) . 
We then assign a number of closest particles Np to each galaxy, 
where each particle must be closer than a certain distance d max . 
Here we have done an efficient k-d tree implementation for differ- 
ent species based on a homogeneous partition. The id's of parti- 
cles and galaxies are stored on a grid permitting us to look up fast 
which particles are in the surrounding of a particular galaxy. In this 
way each galaxy x G is assigned a set of particles {x} z answer- 
ing the question: where does the matter come from which formed 
the structures we observe today. The galaxy bias that is put in this 
method is given by the 2LPT halo bias constructed around galaxies 



(ii) Step 2 (Eq.[3): Structure matching step: Likelihood com- 
parison: Sampling the set of matter tracers at the initial condi- 
tions given an initial Gaussian field and a galaxy distribution. 

The second step requires a structure formation model linking the 
initial Gaussian field to the observed structures given by the set of 



(for a review on the halo-model see Cooray & Sheth 2002). A sim- 
ilar procedure was suggested by Scoccimarro & S heth (2002). This 
bias can be improved with a stochastic halo-model on top of the 
2LPT realisation as performed in the same work. More sofisticated 
approaches could be done base d on halo occupa tion distribution 
models (see the recent work bv lManera et al.ll2012l and references 
therein). This point needs to be further investigated. One should no- 
tice that the first step searches for Gaussian fields compatible with 
a given power-spectrum and in this way already partly corrects for 
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the bias in a fc-dependent way. As we know the initial positions 
of the particles associated to each galaxy we obtain a distribution 
of Lagrangian positions {q} which trace the initial Gaussian field. 
This closes one iteration permitting us to go to the first step again. 
In the first iteration the Gaussian field is assumed to be zero and 
thus the initial displacement field vanishes. 



3 VALIDATION OF THE METHOD 

In this section the numerical experiments are presented which have 
been performed to validate the method. As an input for our stud- 
ies we have taken a galaxy catalogue which u ses a semi-analytic 
halo-model scheme (De Lucia & Blaizot 2007) based on the Mil- 
lennium Run simulation (Springel et al. 2005) with a box of 500 
h^ 1 Mpc side. In particular we have considered a uniform subsam- 
ple of about 530 000 galaxies including all galaxy types conform- 
ing a nonlinear biased tracer of the underlying matter distribution. 
This set-up permits us to test whether our reconstructions of the 
initial conditions and the cosmic web resemble the actual ones of 
the simulation. 

We have made a great effort optimising our computer code 
KlGEN to cope with the har d computational task pres ented in this 
work (for the first version see lKitaura & Ang ulo 2012). We employ 
the Hamiltonian sampling technique in the Ga ussianisation step 
based on the ARGO-code (first and last works iKitaura & EnBlinl 
120081 : IKitaura et all I2012L respectively) and the efficient 2LPT 
structure formation model with parallel FFTs. 

To compute the Gaussianisation step (Eq. [2} and the dis- 
placement fields (Eq. [3} with FFTs we have used a mesh of 
N c =128 3 cells. Structure formation was simulated using 2LPT 
with 7Vp=384 3 particles. Only particles closer than d 
where considered to be friends of a galaxy with dL — L/N, 
being the cell side, allowing for up to Np = 50 particles associ- 
ated to each galaxy. We have tested KlGEN with N!? =1, 5, 10, 20, 
30, 50, finding a stable behaviour already with Np =20. For safety 
we chose Np =50 leaving a thorough analysis for furture work. 
We gave equal weights to those particles, but could improve the 
method considering different weights for instance according to the 
luminosity of the galaxy. We have also performed tests with fewer 
particles finding a worse resemblance with the actual density fields 
from the simulation. Nevertheless, a careful study is still to be done 
to quantify the required number and weights of particles for a given 
resolution and different galaxy type. In addition, we discarded cells 
with less or equal 3 particles as such weak constraints correspond 
to very low density regions which are not correctly captured by 
the 2LPT approximation. Going to third order could m itigate this 
problem (see discussion in lScoccimarro & S heth 2002). 

We have performed 1500 iterations and found convergence in 
the matter statistics and the power-spectra of the reconstructed ini- 
tial conditions after about 600 iterations. For safety we consider, 
however, only the last 500 iterations. 

Let us start demonstrating that the first step in our recon- 
struction method leads to Gaussian distributed fields. Fig. Q] shows 
the matter statistics of 500 samples after convergence has been 
achieved. The continuous black curve shows the mean PDF of 
the overdensity for the 500 samples of the reconstructed initial 
Gaussian field after 1000 iterations. The low skewness, kurtosis 
and mean values indicate that it is indeed Gaussian distributed, 
whereas the gravitationally evolved field using 2LPT unveils a 
highly skewed PDF (dashed curve). 

Fig. [2] shows the power-spectrum of the reconstructed initial 



max — ^ Oji-l 

1/3 



conditions in comparison to the one corresponding to the galaxy 
overdensity field and the ensemble averaged power- spectrum of 
primordial fluctuations in a A-CDM Universe with the same cos- 
mological parameters as in the Millennium Run (see lSpringel et al.l 
2005). The good agreement between the reconstruction and linear 
theory demonstrates that shot noise and gravitational nonlinear evo- 
lution have been removed. On large scales (fc < 0.2 h~ l Mpc) we 
find that the fields reveal a great resemblance with the actual linear 
power-spectrum from the first snapshot of the simulation (see left 
panel of Fig.[2j- 

A visualisation of the results is given in Fig. [3] Here we show 
how the reconstruction of the initial field leads to constrained non- 
linear density field estimates which nicely follow the structures 
traced by the galaxies. This is emphasised by the upper right panel 
in which the input galaxies are overplotted on top of the cosmic 
web reconstruction. It is also remarkable how difficult it is to rec- 
ognize the cosmic web in the initial Gaussian field (lower left panel 
in the of Fig. [3}, and nevertheless how accurately this field leads to 
the observed structures. 

To quantify this resemblance we have computed the cell-to- 
cell correlation between the initial conditions 8(q) given by the 
first available snapshot of the Millennium Run at redshift z = 127 
and the galaxy field 8 (x) in comparison to our reconstructed ini- 
tial field 5 ICC (q) after 1000 iterations (see upper panels in Fig. [4] 
for similar studies see Kitaura and Angulo 201 1). We find that the 
complex nonlinear and nonlocal relation between the initial and 
final fields including galaxy biasing is straightened to a closely un- 
biased relation with our reconstruction scheme. 

Finally we compute the propagator from the first available 
snapshot of the Millennium Run at redshift z =127 and the sam- 
ple of reconstructed initial conditions after 1000 iterations and the 
galaxy overdensity, finding that the correlation is more than dou- 
bled at k =0.3 h~ l Mpc. We can also see however, that a number 
of galaxies in low density regions, mostly living in small filamen- 
tary structures, do not match any structure of the recovered cosmic 
web. The reason is that those isolated galaxies which do not form 
groups do have a low number of particles associated to them which 
leaves them nearly unconstrained. We should also remind that the 
structure formation model we ar e using is simplified and esp ecially 
fails in low density regions (see lScoccimarro & Shethll2002[) . 



4 CONCLUSIONS AND DISCUSSION 

In this work I have presented a new approach to recover the initial 
conditions and the cosmic web structure underlying a galaxy dis- 
tribution. The method is based on sampling Gaussian fields which 
are constrained on a galaxy distribution and a structure formation 
model. 

This is achieved by splitting the inversion problem into two 
Gibbs-sampling steps: the first being a Gaussianisation step trans- 
forming a distribution of point sources at Lagrangian positions 
-which are not a priori given- into a linear alias-free Gaussian 
field. The second step being a matching procedure in which the 
set of matter tracers at the initial conditions is constrained on the 
galaxy distribution and the structure formation model we assume. 
For computational reasons we use second order Lagrangian Pertur- 
bation Theory. We demonstrate taking a semi-analytic halo-model 
based galaxy mock catalog that the recovered initial conditions are 
closely unbiased with respect to the actual ones from the corre- 
sponding TV-body simulation down to scales of a few hT 1 Mpc. 
The cross-correlation between them shows a substantial gain of 
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Figure 2. Left panel: Ratio between the power-spectrum of the sample of 500 reconstructed initial Gaussian fields after 1000 iterations P(k) and the power- 
spectrum from the first available snapshot of the Millennium Run atredshift 2 =127 gridded with nearest grid point on a mesh with 128 3 cells (P L (fc)) (black 
curve). The excess of power at high k is due to aliasing. Dashed curve: same as previous case, but with power- spectrum P(k) corresponding to the galaxy 
overdensity. Dotted curve: Same as continuous black curve, but with P L (fc) being given by the theoretical A-CDM model. Right panel: power-spectra of the 
sample of 500 reconstructed initial Gaussian fields after 1000 iterations (black curve), the galaxy sample (dashed curve), and the theoretical A-CDM model 
(dotted curve). Shaded regions indicate 1 and 2 sigma contours. 
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Figure 3. Reconstruction of the initial conditions and of the cosmic web. Upper left panel: slice abou t 4 h~ Mpc thick a veraged over 9 neighbouring slices 
through a galaxy catalogue (about 530 000 mock galaxies in a volume of 500 h^ 1 Mpc side (De Lucia & Blaizot 2007)). Each galaxy is represented by a 
red circle. Upper right panel: same slice through a sample after 1000 iterations showing the logarithm of the reconstructed nonlinear matter density field 
using 384 3 particles gridded on a 128 3 mesh with triangular shape cloud. The mock galaxies corresponding to the same slice are over-plotted indicating the 
accuracy of the reconstruction method. Lower left panel: same slice through the reconstructed initial conditions corresponding to the same sample which are 
Gaussian distributed. Lower right panel: same as upper right panel without the mock galaxies. 
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information. In addition the initial conditions are extremely well 
Gaussian distributed and the power-spectra closely follow the shape 
of the linear power-spectrum down to scales of k ~ 1 h Mpc -1 . 
The approach presented here has several advantages with respect to 
previous methods. It provides a set of joint estimates of the initial 
conditions, the cosmic web and the peculiar velocity field given a 
galaxy distribution. 

We note, that during the submission process of this letter an 
independent work has appeared on the arxiv trying to solve the 
same problem presented here using the second order Lagrangia n 
perturbation framework within Eq. HI (IJasche & Wandeltl 120120 . 
Their method is substantially different from the one presented here, 
although in its essence both try to solve the boundary problem 
of finding the initial conditions in a probabilistic way using the 
Bayesian framework. While the present approach uses an adap- 
tive particle based scheme and hereby models the collapse of La- 
gr angian regions to com pact obje cts following the i dea suggested 
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Scocc imarro & Shetbl (see 12002') and developed in lManera et al.l 



(2012), their grid-based approach relies on the Poissonian assump- 
tion to model the galaxy distribution. Furthermore their approach 
has been only tested with mock catalogues generated with such a 
2LPT-Poisson model while we test ours with a semi-analytic halo- 
model based galaxy sample based on iV-body simulations. The Kl- 
GEN-code can be immediately substituted by a different structure 
formation model, while their's is more focused on 2LPT leaving the 
redshift distortions unresolved. We note that this can be modeled in 
a straightforward way within the particle based method presented 
here. One just needs to add a second displacement to include co- 
herent flows having the following mapping relation s = q + St + 
(v ■ r)r/(Ha), where v is the full three dimensional velocity field 
(whic h can be accurately computed within 2LPT, see lKitaura et al.l 
|2012|), f is the unit sight line vector, H the Hubble constant and a 
the scale factor. Finger s-of-god could be model ed with a dispersion 
term (see discussion in iKitaura & EnBlin 2008). The KlGEN-code 
is flexible and could be extended to be more precise by includ- 
ing a more detailed structure formation description like the one 
provided by full TV-body simulations. This will imply, however, 
a high computational cost. Although the results seem to indicate 
that this approach could be useful for power- spectrum reconstruc- 
tions as well, one should be cautious since the power-spectrum 
is included as a prior in the Gaussianisation step. This assump- 
tion could be relaxed including an additional Gibbs-sampling step 
for power-spectrum estima t ion as proposed by [Kitaura & Enfilinl 
j2008h : IJasche et alj feOloT) ; I Kitaura et alj 120121) . Note that the as- 
sumed power-spectrum did not include any kind of wiggles. The 
features in the power-spectra of the recovered fields show how- 
ever a great resemblance with the actual one indicating that this 
method could be useful for BAO reconstruction. Redshift space dis- 
tortions caused by the proper motions of galaxies can be treated in 
the same framew ork as we show in a subsequent work (see dis- 
cussion above and I Kitaura et al J 20 121 . for a similar approach). The 
scatter between the recovered initial conditions and the actual ones 
could be reduced by improving the resolution and the constraints, 
for instance giving different weights to the galaxies according to 
their luminosity, or using a compiled halo catalogue as an input. 
This framework could also permit one to include a light-cone and 
consider hereby evolution effects by producing different snapshots 
at several redshifts. The forward approach is also especially robust 
as it is calibrated with the input data in each iteration. 

In summary, the method presented here provides a very pow- 
erful approach to the inversion problem of density field reconstruc- 
tion of the initial conditions and the present cosmic web. 




Figure 4. Cell-to-cell correlation after Gaussian smoothing with radius 5 
h~ 1 Mpc between: the initial conditions 8(q) given by the first available 
snapshot of the Millennium Run at redshift z =127 (left panel:) and the 
galaxy field S G (x), (right panel:) and the reconstructed initial Gaussian 
field after 1000 iterations 8 TCC (q). The dark colour-code indicates a high 
number and the light colour-code a low number of cells. 
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